-
Notifications
You must be signed in to change notification settings - Fork 995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
remove the specialness from GPU requests #1489
Conversation
✅ Deploy Preview for karpenter-docs-prod canceled. 🔨 Explore the source changes: 908be5e 🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/6238cd1e12a6f200081a3611 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! It is nice to get the AWS stuff out of the core code!
6560c2e
to
1e148d7
Compare
@@ -58,6 +60,9 @@ const ( | |||
|
|||
func init() { | |||
v1alpha5.NormalizedLabels = functional.UnionStringMaps(v1alpha5.NormalizedLabels, map[string]string{"topology.ebs.csi.aws.com/zone": v1.LabelTopologyZone}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Man it would be sweet to register NormalizedLabels in the same way!
85cd89a
to
2f3a0fe
Compare
dfc4d26
to
0a91fdd
Compare
} | ||
} | ||
|
||
// These are meant to give some relative weighting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment makes me uneasy as a reader. They're meant to do X, but do they? Is it accurate to be more concrete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I should just remove that comment entirely. Trying to indicate that it's not a real "price", but people should be able to figure that out easily enough.
pkg/cloudprovider/registry.go
Outdated
// creation unless a pod specifically requests this resource type. This is useful for preventing non-GPU workloads | ||
// from possibly scheduling to more expensive GPU instance type, or from causing a GPU instance type to scale up to | ||
// the next larger type due to a non-GPU workload | ||
ResourceFlagMinimizeUsage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I keep thinking about this complexity. I'm wondering if it's enough to purely delineate on WellKnown vs NotWellKnown resource types? If we know about it (e.g., CPU, Memory, EphemeralStorage) then we can use the ResourceFlagNone
behavior. If we don't know about it, we can use ResourceFlagMinimizeUsage
behavior. This will enable us to simply not configure any of this (for now).
My guess is that we will need to be more sophisticated with binpacking in the future, but I don't think we have enough use cases to have clarity on what the parameters are. Given that with currently known resource types, the above suggestion is equivalent in implementation.
Happy to move forward in either direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a longer comment below, but I'm starting to think this complexity isn't worth it since kube-scheduler won't follow this logic and we're left with using taints/tolerations again which are already built-in.
1. Issue, if available:
N/A
2. Description of changes:
This modifies Karpenter to treat resource requests identically. It
removes the special logic for some AWS specific GPU types and
allows CloudProvider implementers to provide their own custom
resource types along with an ordering to be used as a hint for
binpacking regarding which instance types to prefer.
3. How was this change tested?
Unit tests & deploying GPU/non-GPU workloads. Quota limits prevented GPU instances from being
created, but I could see the requests.
4. Does this change impact docs?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.